Abstract. VizieR is a database grouping in a homogeneous way
thousands of astronomical catalogues gathered over several decades
by the Centre de Données de Strasbourg (CDS) and participating
institutes. The history and current status of this large collection
are briefly presented, and the way these catalogues are being
standardized to fit in the VizieR system is described. The
architecture of the database is then presented, with emphasis on the
management of links and of accesses to very large catalogues.
Several query interfaces are currently available, making use of the
ASU protocol, for browsing purposes or for use by other data
processing systems such as visualisation tools.
1. Introduction

The Centre de Données astronomiques de Strasbourg (CDS) has a very
long experience in acquiring, cross-identifying, and distributing
astronomical data (Genova et al. 2000): a collaboration for the
exchange of what was called machine-readable astronomical data
started with the NASA-GSFC and the Astronomisches Rechen-Institut
around 1970. This collaboration has been maintained over this
30-year period, and collaborations with other institutes for similar
exchanges have been developed. The volume of data shared has of
course increased, at a rate which has been exploding in recent
years.

Compared to the late 1960s, when the bulk of the machine-readable
data consisted of a set of basic catalogues carefully keypunched,
the situation has changed drastically, now that every instrument or
detector generates megabytes or gigabytes of daily output.
Fortunately, these huge data sets are not stored in the data
centers, but are processed at the observing center, where the
expertise exists to generate the best high-quality archives and
catalogues in a form usable by astronomers who are not familiar with
the instrument. The Data Centers' role is essentially to collect
such "final" catalogues, or more generally high-quality data, i.e.
data which were either published in the refereed scientific
literature, or for which a paper describing the data and their
context was accepted for publication in a refereed scientific
journal.

Making efficient use of the data distributed by the data centers,
for instance for the analysis of the statistical properties of some
interesting population of stars, often requires combining data
coming from several data sets. This operation is far from simple,
and this is why the first creation of CDS was SIMBAD, a database
resulting from the cross-identification of the major catalogues,
later expanded to thousands of catalogues and to the published
literature (see Wenger et al. 2000).

The VizieR system results from a different approach: the
astronomical catalogues are kept in their original form, but
homogeneous descriptions of all these data sets are provided in
order to maximize their usability. In other words, VizieR relies on
a homogenization of the catalogue descriptions (what is also called
metadata, or data describing other data) to transform the set of
machine-readable astronomical catalogues into a set of
machine-understandable data. VizieR itself is an interface able to
query this set of machine-understandable astronomical catalogues.

2. Astronomical Catalogues 



Send offprint requests to: F. Ochsenbein
(francois@astro.u-strasbg.fr)



Jaschek (1989) defined a catalogue as a long list of ordered data of
a specific kind, collected for a particular purpose. What a long
list means has evolved dramatically in the last decade: the new way
of processing data has resulted in a tremendous increase in both the
number and the volume of astronomical catalogues. To illustrate the
evolution in the domain of catalogued surveys, recall that the
largest catalogues at the beginning of this century, the
Durchmusterungen (the Bonner, Cordoba and Cape Durchmusterungen),
provided only a position and a visual estimate of the brightness for
~1.5 × 10^6 stars, and required over 50 years to be completed.
Today, a catalogue gathering similar parameters, with an accuracy
one order of magnitude better, is well represented by the USNO-A2.0
(Monet 1998), which contains roughly 5 × 10^8 sources, i.e. almost
three orders of magnitude more. Even larger catalogues are being
built: the GSC-II (Greene et al., 1998) should contain all optical
sources brighter than 18th magnitude, estimated at about 2 × 10^9
objects.

Table 1. Evolution of the annual number of papers, and the
percentage of papers with associated electronic data, for some of
the main astronomical journals

Table 2. Summary of the evolution of accessible digital catalogues
in the last five years (number of catalogues and sizes in Mbytes).
The last column gives the number of catalogues with a standardized
description (see section 4).

The existence of these new mega-catalogues (which are, in fact,
rather giga-catalogues) does not however mean that the old
catalogues can just be ignored: virtually any astronomical object
can be subject to variability, possibly over periods of several
centuries, and the discrepancies between old and newer results
therefore have to be analyzed.

Another important source of tabular material consists of tables
published in the astronomical literature. These tables are now
almost always originally in digital form, and contain highly
processed data whose reuse can be valuable; access to these
electronic data is also essential for maintaining large databases
like Simbad or NED.

The potential interest of reusing these tables led the Editors of
the leading astronomical journals to distribute the tabular material
in electronic form. The first realisations for A&A started in 1993
(see Ochsenbein & Lequeux 1995), and Table 1 summarizes how
frequently electronic tabular data have been available among the
publications in some of the main astronomical journals in recent
years: not surprisingly, the Supplement Series, which were created
essentially for the presentation of observational results, show a
high rate of associated electronic data.



3. Astronomical Catalogues in the Data Centers 

3.1. Current Contents 

The growth of the collection of astronomical catalogues managed by
data centers is illustrated by Table 2: the current set of available
catalogues is now around 3,000, with an annual increase of about
15%. Note that the entity designated as a "catalogue" can represent
a table of about 100 entries (e.g. the list of galactic globular
clusters), as well as a multimillion source catalogue (e.g. the
USNO-A2.0).

In Table 2, the catalogues are grouped according to categories which
were defined in the 1970s, when the bulk of astronomical studies
dealt with the properties of stars in the optical wavelength domain.
Rather than regularly defining a new classification scheme following
the evolution of the discipline, it was decided, in agreement with
the other data centers, to assign designations to electronic tables
according to the published paper, and to reserve the assignment in
the "traditional" categories to somewhat important catalogues or
compilations. Simultaneously, it was decided to assign keywords to
each catalogue, in order to allow easy retrieval of catalogues with
similar contents and purposes.

Note that, although most of the catalogues contain data related to
the observation of astronomical sources, other types of data are
also available, generally grouped in the "Miscellaneous" (VI)
category: catalogues of atomic data like wavelength tables or
results of the Opacity Project, tabulated results of stellar
evolution models, ephemeris elements, etc.

3.2. Usage of astronomical catalogues 



Period               Files    Gbytes    Nodes

1993Jan-1993Dec       6106      1.5       458
1994Jan-1994Dec      23696      6.1      1599
1995Jan-1995Dec      57314     11.4      4022
1996Jan-1996Dec      71300     19.8      4953
1996Oct-1997Sep     143000     43.5      6279
1997Oct-1998Sep     308840     74.5      9780
1998Oct-1999Sep     538407     77.1     10146

Table 3. Yearly traffic on the CDS catalogue ftp server
(internal and mirror traffic excluded)

Number of Nodes
1999  (1998)   Catalogue designation and short title

 879  (750)    (I/239) Hipparcos & Tycho Catalogues
 502  (123)    (I/220) The HST Guide Star Catalog, V1.1
               (Lasker+ 1992)
 293  (165)    (VI/87) Planetary Ephemerides (Chapront+ 1996)
 284  (241)    (I/131A) SAO Star Catalog J2000 (SAO Staff
               1966; USNO, ADC 1990)
 248   (60)    (I/197) Tycho Input Catalogue, Revised version
               (Egret+ 1992)
 203  (221)    (VII/118) NGC 2000.0
 195  (162)    (V/50) Bright Star Catalogue, 5th Revised Ed.
               (Hoffleit+, 1991)
 173  (145)    (VI/80) Opacities from the Opacity Project
               (Seaton+, 1995)
 169  (134)    (I/246) The ACT Reference Catalog (Urban+ 1997)
 126  (142)    (V/70A) Nearby Stars, Preliminary 3rd Version
               (Gliese+ 1991)
 124  (120)    (VI/81) Planetary Solutions VSOP87
               (Bretagnon+, 1988)
 112   (73)    (VII/207) Quasars and Active Galactic Nuclei
               (8th Ed.) (Veron+ 1998)
 102  (134)    (II/214A) Combined General Catalogue of
               Variable Stars (Kholopov+ 1998)
 101   (76)    (VII/155) Third Reference Cat. of Bright Galaxies
               (RC3) (de Vaucouleurs+ 1991)
 100   (75)    (VI/79) Lunar Solution ELP 2000-82B
               (Chapront-Touze+, 1988)
  99  (153)    (VI/69) Atomic Spectral Line List (Hirata+ 1995)
  97  (149)    (V/95) SKY2000 - Master Star Catalog
               (Myers+ 1997)
  90  (118)    (I/196) Hipparcos Input Catalogue, Version 2
               (Turon+ 1993)

Table 4. Catalogues which have been the most frequently
copied



One of the main goals of the CDS is to promote the use of reliable
astronomical catalogues within the astronomical community. The
"Catalogue Service" has been one of the major CDS services since the
beginning of the CDS activity, and used to distribute catalogues on
magnetic tapes and floppies; the service was implemented on the
network as an FTP server in March 1992, immediately generating a
large increase in the number of distributed files. The FTP activity
is still increasing at a high rate, as can be inferred from Table 3:
the current traffic is equivalent to a copy of the whole collection
every month.

It is also interesting to list those catalogues which are the most
frequently copied from the CDS archives, summarized in Table 4 for
the last two years: not surprisingly, surveys, and what Jaschek
(1989), in his section 5.2, designates as General Compilation
Catalogues, are among the most popular catalogues. It is also
interesting to note the large number of copies of the GSC catalogue
(about 300 Mbytes): it was copied by over 500 nodes in the last 12
months, which is 4 times more than in the previous year; this could
indicate that catalogues of this size can now be managed quite
easily on small computers.

4. Standardized Description of Astronomical 
Catalogues 

Making use of the data contained in a set of rapidly evolving
catalogues, as illustrated by Table 2, raises the problem of
accessing and understanding accurately the parameters contained in
catalogues which are constantly improved. Typical questions to be
addressed are: does the catalogue contain colours; if yes, what is
their reliability; are they expressed in a well-known standard
system; are they taken from other publications or catalogues; how
can the associated data file be processed? All these details which
describe the data (the metadata) are traditionally presented in the
introduction of the printed catalogue, or detailed in one or several
published papers presenting and/or analyzing the catalogued data.

Metadata therefore play a fundamental role. First, scientists need
information about the environment of the data in order to judge the
suitability of the data for their project: date and/or method of
acquisition, related publications, estimation of the internal and
external errors, purpose of the data collection, etc. Second, a
minimal knowledge of the metadata is required by the data processing
system in order to merge or compare data from different origins: for
instance, the comparison of data expressed in different units
requires a unit-to-unit conversion which can be performed
automatically only if the units are specified unambiguously.
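
The role of unambiguous units can be illustrated with a minimal
sketch. The unit names below follow those visible in the ReadMe
example of Fig. 1 (deg, arcmin, arcsec); the table of factors and
the function name are assumptions made for this illustration, not
part of any CDS software.

```python
# Hypothetical table of conversion factors to a common reference unit (degrees).
# Only unit strings listed here are considered "unambiguous" by this sketch.
TO_DEGREES = {
    "deg": 1.0,
    "arcmin": 1.0 / 60.0,
    "arcsec": 1.0 / 3600.0,
}


def convert(value, unit_from, unit_to):
    """Convert an angle between two units that are both declared in the metadata."""
    try:
        return value * TO_DEGREES[unit_from] / TO_DEGREES[unit_to]
    except KeyError as exc:
        # An unknown unit string cannot be converted automatically.
        raise ValueError(f"unknown unit: {exc.args[0]}") from None


half_degree = convert(30.0, "arcmin", "deg")
```

A column whose unit string is missing or non-standard raises an
error here, which is the situation the standardized description is
meant to avoid.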

This need for a description which is readable both by a computer and
by a scientist led to a standardized way of documenting astronomical
catalogues and tables, promoted by CDS from 1993 in the form of a
dedicated ReadMe file associated to each catalogue (Ochsenbein
1994). An example of such a file is presented in Fig. 1: it is a
plain ASCII file, quite easy to interpret for a scientist, and at
the same time structured enough to be interpreted by dedicated
software. The ReadMe description file starts with a header
specifying the basic references (title, authors, references) and
contains a few key sections introduced by standard titles like
Description: or Byte-by-byte Description of file:. Such a file is
relatively easy to produce by someone who knows the catalogue
contents. The example of Fig. 1 represents the documentation of a
very simple catalogue, made of just two data tables, each with a
small set of parameters. The output catalogue of the Hipparcos
mission¹ is an example of a much more complex catalogue: it is
composed of two fundamental large tables (HIP with 10^5 stars and
TYC with 10^6 stars) and includes a dozen annex tables, but can
still be described by the same kind of simple standardized
documentation.

    The Magellanic Catalogue of Stars - MACS        (Tucholke+ 1996)
    ========================================================================
    The Magellanic Catalogue of Stars - MACS
        Tucholke H.-J., de Boer K.S., Seitter W.C.
       <Astron. Astrophys. Suppl. Ser., 119, 91-98 (1996)>
       <The Messenger 81, 20 (1995)>
       =1996A&AS..119...91T      =1995Msngr..81...20D
    ========================================================================
    ADC_Keywords: Magellanic Clouds ; Positional data

    Description:
        The Magellanic Catalogue of Stars (MACS) is based on scans of ESO
        Schmidt plates and contains about 244,000 stars covering large areas
        around the LMC and the SMC. The limiting magnitude is B<16.5m and the
        positional accuracy is better than 0.5" for 99% of the stars. The
        stars of this catalogue were screened interactively to ascertain that
        they are undisturbed by close neighbours.

    File Summary:
    ------------------------------------------------------------------------
     FileName   Lrecl   Records   Explanations
    ------------------------------------------------------------------------
    ReadMe         80         .   This file
    lmc            52    175779   The Large Magellanic Cloud
    smc            52     67782   The Small Magellanic Cloud
    ------------------------------------------------------------------------

    Byte-by-byte Description of file: lmc smc
    ------------------------------------------------------------------------
       Bytes Format  Units   Label      Explanations
    ------------------------------------------------------------------------
       1- 12  A12    ---     MACS       Designation
      14- 15  I2     h       RAh        Right Ascension J2000, Epoch 1989.0 (hours)
      17- 18  I2     min     RAm        Right Ascension J2000 (minutes)
      20- 25  F6.3   s       RAs        Right Ascension J2000 (seconds)
          27  A1     ---     DE-        Declination J2000 (sign)
      28- 29  I2     deg     DEd        Declination J2000, Epoch 1989.0 (degrees)
      31- 32  I2     arcmin  DEm        Declination J2000 (minutes)
      34- 38  F5.2   arcsec  DEs        Declination J2000 (seconds)
          40  I1     ---     Npos       Number of positions used
      42- 46  F5.2   mag     Mag        []?=99.00 Instrumental Magnitude
                                          (to be used only in a relative sense)
          48  I1     ---     PosFlag    [0/1] Position Flag (0: ok,
                                          1: internal error larger than 0.5")
          50  I1     ---     MagFlag    [0/1] Magnitude Flag (0: ok,
                                          1: bad photometry or possible variable)
          52  I1     ---     BochumFlag *[0] Bochum Flag
    ------------------------------------------------------------------------
    Note on BochumFlag: 1 if in Bochum catalog of astrophysical information
                        on bright LMC stars (yet empty)
    ------------------------------------------------------------------------

    Author's address:
        Hans-Joachim Tucholke    <tucholke@astro.uni-bonn.de>
    ========================================================================
    (End)            Hans-Joachim Tucholke [Univ. Bonn]          20-Nov-1995

Fig. 1. Example of a documentation ReadMe file

¹ http://vizier.u-strasbg.fr/cgi-bin/Cat?I/239

The most important part of the ReadMe file is the Byte-by-byte
Description, which details the table structures in terms of formats,
units, column naming or labels, existence of data (possibility of
unspecified or null values), and brief explanations. Among the
conventions, some fundamental parameters are assigned fixed labels,
like sky coordinates (components of right ascension RA... and
declination DE... in Fig. 1); a prefix convention, detailed in
Table 5, is also used to specify obvious relations between a value,
its mean error, its origin, etc.
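
To illustrate why such a description is usable by software, the
following sketch reads a few Byte-by-byte Description lines and
applies them to a fixed-width record. It is only an illustration
under stated assumptions: the regular expression, the function names
and the sample record are invented here (the record is not taken
from the MACS catalogue), and the actual CDS tools are not described
in this paper.

```python
import re

# One description line: "start- end  format  units  label  explanation".
# The start byte is optional for single-byte columns.
FIELD = re.compile(
    r"^\s*(?:(\d+)\s*-\s*)?(\d+)\s+([AIFE][\d.]+)\s+(\S+)\s+(\S+)\s+(.*)$"
)


def parse_description(lines):
    """Turn description lines into a list of column definitions."""
    fields = []
    for line in lines:
        match = FIELD.match(line)
        if match is None:
            continue  # continuation lines and rules are skipped
        start, end, fmt, unit, label, text = match.groups()
        end = int(end)
        start = int(start) if start else end
        fields.append({"start": start, "end": end, "format": fmt,
                       "unit": unit, "label": label, "text": text})
    return fields


def read_record(record, fields):
    """Slice one fixed-width record according to the column definitions."""
    row = {}
    for field in fields:
        raw = record[field["start"] - 1:field["end"]].strip()
        kind = field["format"][0]
        if not raw:
            row[field["label"]] = None          # blank means unspecified
        elif kind == "A":
            row[field["label"]] = raw
        elif kind == "I":
            row[field["label"]] = int(raw)
        else:
            row[field["label"]] = float(raw)
    return row


description = [
    "   1- 12  A12    ---     MACS     Designation",
    "  14- 15  I2     h       RAh      Right Ascension J2000 (hours)",
    "  17- 18  I2     min     RAm      Right Ascension J2000 (minutes)",
    "  20- 25  F6.3   s       RAs      Right Ascension J2000 (seconds)",
]
fields = parse_description(description)
# Invented 25-byte record laid out according to the four columns above.
row = read_record("0515-690#001 05 15 12.345", fields)
```

The same column definitions could equally drive a format checker or
a converter, since the byte ranges, formats and units are all
explicit.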



Symbol     Explanation

a_label    aperture used for parameter label
E_label    mean error (upper limit) on parameter label
e_label    mean error (σ) on parameter label
f_label    flag on parameter label
l_label    limit flag on parameter label
m_label    multiplicity index on parameter label to resolve
           ambiguities
n_label    note (remark) on parameter label
o_label    number of observations on parameter label
q_label    quality on parameter label
r_label    reference (source) for parameter label
u_label    uncertainty flag on parameter label
w_label    weight of parameter label
x_label    unit in which parameter label is expressed

Table 5. Conventions used for label prefixes
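
The prefix convention lends itself to a simple automatic pairing of
columns. The sketch below assumes the prefixes of Table 5 read as
a, E, e, f, l, m, n, o, q, r, u, w and x; the function names and the
label Vmag are hypothetical and serve only as an illustration.

```python
# Meaning of each one-letter prefix, as read from Table 5.
PREFIXES = {
    "a": "aperture",
    "E": "mean error (upper limit)",
    "e": "mean error (sigma)",
    "f": "flag",
    "l": "limit flag",
    "m": "multiplicity index",
    "n": "note",
    "o": "number of observations",
    "q": "quality",
    "r": "reference",
    "u": "uncertainty flag",
    "w": "weight",
    "x": "unit",
}


def split_label(label):
    """Return (role, base label); role is None for an unprefixed label."""
    prefix, sep, base = label.partition("_")
    if sep and base and prefix in PREFIXES:
        return PREFIXES[prefix], base
    return None, label


def related_columns(labels):
    """Group prefixed columns under the base column they qualify."""
    related = {}
    for label in labels:
        role, base = split_label(label)
        if role is not None and base in labels:
            related.setdefault(base, []).append((label, role))
    return related


# Hypothetical set of column labels from one table.
groups = related_columns(["RAh", "Vmag", "e_Vmag", "r_Vmag"])
```

With these labels, the magnitude column is automatically associated
with its mean error and its reference column, without any
catalogue-specific configuration.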



This standardized way of presenting the metadata proved to be
extremely useful, especially for data checking and format
conversion: many errors were detected in old catalogues simply
because a general checking mechanism became available. Tools have
been developed for generating Fortran source code which loads the
data into memory, or for converting the data into the FITS format,
which is presently the most "universal" data format understood by
data processing systems in astronomy, but unfortunately a data
format which is not convenient outside this context (see e.g.
Grosbøl et al. 1988).

During the six years since this standardized way of describing
astronomical catalogues was defined, over 2,600 astronomical
catalogues have been described by means of this ReadMe file, and the
same conventions have been adopted by the other astronomical data
centers and journals for the electronic publication of tables. The
present (October 1999) numbers of standardized catalogues are
summarized in the rightmost column of Table 2; previous figures were
presented in an earlier paper (Ochsenbein 1997).

It is expected that, in the future, authors will supply the
documentation of their data in this simple form; this is already the
case for a very significant fraction of the tables mailed to the
CDS. In order to help the authors, template files as well as a few
tips on how to create the ReadMe file are accessible on the Web².
The ReadMe files and the data files are then checked by a
specialist, who contacts the authors if errors are detected or when
changes are necessary to increase the clarity or homogeneity of the
description.



5. VizieR Organisation 

VizieR is a natural extension of the usage of the metadata stored in
the ReadMe files, as an implementation of these metadata in terms of
tables managed by a relational database management system (RDBMS).

The first prototype of VizieR was the result of a fruitful
collaboration between ESIS (European Space Information System, a
project managed by ESRIN, a department of the European Space Agency)
and the CDS; VizieR has been under full responsibility of CDS since
January 1996. It was presented at the 1996 AAS meeting (Ochsenbein
et al., 1996), and became fully operational in February 1996. This
prototype was significantly upgraded in May 1997, just in time for
the implementation of the final catalogues of the Hipparcos mission.
The number of catalogues accessible within the VizieR system has
grown since that time to 2,374 catalogues (Table 6).

The core of VizieR consists of the organisation of the meta
dictionary, i.e. the set of metadata extracted from the standardized
ReadMe descriptions discussed in section 4. There are however two
main problems which had to be solved: the access to very large
catalogues (larger than a few million rows), for which the RDBMS
proved to be inefficient and which therefore require dedicated
search methods; and the generation of links connecting two related
pieces of information, like other tables in the same catalogue, or
spectra and images from remote services, etc.

5.1. META dictionary



VizieR contents            All           Dealing with objects
in terms of:               Catalogues    having positions

Catalogues:                2374          1247
Tables:                    6071          1929
Columns:                   77260         30261
Rows:                      1.17 × 10^9   1.16 × 10^9
(without megacatalogs)     40.3 × 10^6   31.6 × 10^6

Table 6. Summary of the VizieR contents (November 1999)



² http://vizier.u-strasbg.fr/doc/submit.htx

The meta-dictionary consists of 3 main tables detailed below, and
about 20 annex tables, all stored in a relational database:

1. METAcat describes the catalogues, a catalogue being defined as a
   set of related tables published together: typically a catalogue
   gathers a table of observations, a table of mean values, a table
   of references, a list of related images, etc. METAcat details
   the authors, reference, title and explanations of each stored
   catalogue. This table currently contains 2,374 rows (Table 6).

2. METAtab describes each data table stored in VizieR: table
   caption, number of rows, how to access the actual data, the
   equinox and epoch of the coordinates, etc. This table currently
   contains 6,071 rows (Table 6), i.e. the average catalogue is
   made of 2.6 tables.

3. METAcol details each of the 77,260 columns (Table 6) currently
   stored in VizieR: column name or label, the textual explanation
   of the column contents, datatypes (numeric or character) and
   storage mode within the database (integer or floating-point,
   maximal length of strings, etc.), units in which the data are
   stored in the database and units in which the data are presented
   to the user, edition formats, and a few flags used for searches
   (e.g. column used as primary key) or data presentation (e.g.
   column to be displayed in the default presentation of the
   result). The average table is therefore made of ~12.7 columns,
   in fact ~11.7 because each table contains an identification
   column in addition to the original set of columns.

Note that, since the set of META tables is itself described in
VizieR, the meta-dictionary can be viewed and queried like any of
the catalogues stored in VizieR. This makes it easy to locate, for
example, tables with a large number of rows, or catalogues having
the words mass loss in the description of one of their columns.
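
A minimal relational sketch of the three main tables, and of the two
queries just mentioned, is given below. Only the table names
METAcat, METAtab and METAcol come from the text; the column names,
the sample rows and the use of SQLite are assumptions chosen to keep
the example self-contained, and do not reflect the actual VizieR
schema or RDBMS.

```python
import sqlite3


def build_meta():
    """Create an in-memory toy meta-dictionary with invented contents."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE METAcat (catid INTEGER PRIMARY KEY, name TEXT, title TEXT);
        CREATE TABLE METAtab (tabid INTEGER PRIMARY KEY,
                              catid INTEGER REFERENCES METAcat(catid),
                              name TEXT, nrows INTEGER);
        CREATE TABLE METAcol (colid INTEGER PRIMARY KEY,
                              tabid INTEGER REFERENCES METAtab(tabid),
                              label TEXT, unit TEXT, explanation TEXT);
    """)
    # All rows below are invented for the illustration.
    con.executemany("INSERT INTO METAcat VALUES (?, ?, ?)", [
        (1, "demo/1", "Invented survey"),
        (2, "demo/2", "Invented stellar wind table"),
    ])
    con.executemany("INSERT INTO METAtab VALUES (?, ?, ?, ?)", [
        (1, 1, "main", 1000000),
        (2, 1, "notes", 250),
        (3, 2, "table1", 120),
    ])
    con.executemany("INSERT INTO METAcol VALUES (?, ?, ?, ?, ?)", [
        (1, 1, "Vmag", "mag", "Visual magnitude"),
        (2, 3, "Mdot", "solMass/yr", "Mass loss rate"),
    ])
    return con


def large_tables(con, min_rows):
    """Names of the tables having at least min_rows rows."""
    rows = con.execute(
        "SELECT name FROM METAtab WHERE nrows >= ? ORDER BY name", (min_rows,))
    return [r[0] for r in rows]


def catalogues_mentioning(con, phrase):
    """Catalogues with a column whose explanation contains the phrase."""
    rows = con.execute(
        """SELECT DISTINCT c.name
           FROM METAcat c
           JOIN METAtab t ON t.catid = c.catid
           JOIN METAcol k ON k.tabid = t.tabid
           WHERE k.explanation LIKE '%' || ? || '%'
           ORDER BY c.name""", (phrase,))
    return [r[0] for r in rows]


con = build_meta()
```

Because the META tables are ordinary relational tables, both
questions reduce to plain selections and joins.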

The annex tables of the meta-dictionary contain some definitions,
like the list of known data-types (METAtypes) and keywords
(METAkwdef); or other details like the acronyms used to designate
well-known catalogues like HIP, GSC ... (METAacro), the keywords
associated to each catalogue (METAkwd), detailed notes and remarks
(METAnot), or the list of those objects which are individually
quoted in the ReadMe files (METAobj). A special indexing scheme
(METAcell), explained briefly in section 5.5, was built to locate
the existing objects in all catalogues in a single run. Details on
how to generate links are stored in the METAmor table.



5.2. Links in VizieR 

The interest of having a link, or an anchor in HTML terms, becomes
obvious when a table contains a column representing a reference to
an original paper, as for example in Veron and Veron's compilation
of quasars⁴: once the rules to transform the contents of this
column into an actual link to e.g. the ADS bibliographic service⁵
are set up, details about the authors and references, or even the
full article, can be displayed on the screen by a simple mouse
click. Another frequent example is the possible expansion of some
footnote symbol into the lengthy note detailed in some other table.



⁴ http://vizier.u-strasbg.fr/cgi-bin/VizieR?-source=7207/table1

⁵ http://adswww.harvard.edu/



The links existing in VizieR may be classified in the 
following categories: 

1. hard-wired links which are part of the standard description
presented in section 4, like the existence of notes (stored in the
METAnot table), or the r_ prefix (Table 5) which indicates a
reference which may be detailed in a table of references;

2. internal links which connect tables of the same cata- 
logue: such links may be expressed in terms of keys 
in the RDBMS terminology (definitions of columns 
as primary and/or foreign keys), by the existence of 
note flags, or by more complex relations stored in the 
METAmor table. Another type of internal link allows 
one to retrieve the spectra or images which are part of 
the catalogue, but which are stored as separate files. 

3. VizieR links which refer to another catalogue within 
the VizieR system; 

4. external links which refer to any other service, like bib- 
liographic services, external databases or archives, im- 
age servers, etc. 

While links of the first 3 categories can easily be maintained, the
maintenance of the external links depends on modifications which are
completely outside VizieR's control. These external links are
maintained by the GLU system (Fernique et al., 1998), a system which
(i) allows one to use symbolic names instead of hard-coded URLs, and
(ii) translates these symbolic names with the help of a distributed
dictionary in which the service providers keep the descriptions of
their own services up to date, solely in terms of URL addresses and
actual presentation of the query parameters.
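
The principle of symbolic names can be sketched as follows. This is
not the GLU syntax or API, which is not described here: the
dictionary layout, the tag name and the example.org URLs are all
invented for the illustration.

```python
from urllib.parse import quote

# Hypothetical dictionary entry: symbolic name -> URL template.
# In the scheme described above, each provider maintains its own entries.
DICTIONARY = {
    "Biblio": "http://example.org/biblio?code={0}",
}


def resolve(tag, *params, dictionary=DICTIONARY):
    """Translate a symbolic name and its parameters into an actual URL."""
    template = dictionary[tag]
    return template.format(*(quote(str(p), safe="") for p in params))


url = resolve("Biblio", "1996A&AS..119...91T")

# If the provider moves its service, only the dictionary entry changes;
# the symbolic name stored with the catalogue stays the same.
moved = dict(DICTIONARY, Biblio="http://example.org/v2/lookup/{0}")
new_url = resolve("Biblio", "1996A&AS..119...91T", dictionary=moved)
```

The tables only ever store the symbolic name and the parameter
value, so a change of address at the remote service requires no
change in the database.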



5.3. VizieR feeding pipeline 

On average, about one new catalogue (or 2.6 tables) is added daily
into VizieR. Such figures imposed the following constraints on the
addition of new tables into VizieR:

1. no human intervention is required to populate the database (the
   meta dictionary and the data tables): all metadata related to a
   catalogue can be found or computed on the basis of documentation
   and configuration files which are read by the VizieR feeding
   pipeline;

2. we rely as much as possible on the standardized description of
   the catalogues presented in section 4: this means that the
   configuration file associated to each catalogue should be
   minimized, i.e. as few ad-hoc details as possible should be
   needed besides the ReadMe files.

The time required to ingest a new catalogue into the system has two
components. The preparation of the ReadMe description file currently
takes between a few minutes and several days, depending on the
initial presentation supplied by the authors and on the catalogue
complexity; it can occasionally take longer when problems are
encountered which require interactions with the authors. The actual
ingestion into VizieR from the standardized files then takes from a
few seconds up to an hour.

5.4- Access to Very Large Catalogues 



Acronym Rows Catalogue designation 

(xlO 6 ) 

USNO-A1.0 488.0 The USNO-A1.0 Catalog (Monet 
1997) 

USNO-A2.0 526.3 The USNO-A2.0 Catalog (Monet 
1998), calibrated against Tycho 
data 

GSC1.1 25.2 HST Guide Star Catalog, 1992 ver- 

sion 

GSC1.2 25.2 HST Guide Star Catalog, 1996 ver- 

sion 

GSC-ACT 25.2 HST Guide Star Catalog, calibrated 

against Tycho data^ 

2MASS 20.2 2fj,m All Sky Survey, Spring 1999 re- 

lease (Skrutskie et al., 1997) 

DENIS 17.5 Deep Near-IR S urvey first release 

(Epchtein et al., |l999j) 



dedicated program for accessing it. VizieR stores in its 



t calibration made by the Pluto project 
( bttp:// www. pro j ectpluto .com / gsc_act . htm ) 

Table 7. Large catalogues currently implemented in 
VizieR 



The second challenge is to open a fast access for query- 
ing the meg a- catalogues introduced in section ^. This de- 
nomination was somewhat arbitrarily assigned to cata- 
logues having 10 7 or more rows. Such large catalogues are 
essentially surveys used as reference catalogues, typically 
to find all objects detected in some region of the sky un- 
der some conditions of wavelength, time, object structure, 
etc. The set of such catalogues currently implemented is 
summarized in Table |7|, but this set will grow rapidly in 
the near future with the continuation of the infra-red sur- 
veys, and the emergence of surveys presently in prepara- 
tion (SLOAN, GSC-II, NVSS, . . . ). 

The limit of 10 7 rows corresponds to a limit in per- 
formance and time required to ingest the tables into the 
relational databases; the largest table, in terms of number 
of rows, currently stored in VizieR is the AC2000 catalog 
(Urban et al. 



1997), with 4.62 x 10 6 rows. 



The method used to access these very large catalogues 
consists in grouping the objects within carefully designed 
groups based essentially on the location in the sky, fol- 
lowed by a lossless compression obtained by replacing the 
actual values by offsets within the group; details about the 
actual results and performances are described in another
paper (Derriere & Ochsenbein, 1999). Each very large
catalogue presently has its own organisation, which depends
on its actual column contents, and therefore requires a
dedicated access program. The META dictionary (see section 5.1)
records which program has to be called to actually access the
catalogue, and the description of the columns as they are
returned from the dedicated program.



5.5. Accessing all catalogues from a position in the sky

In order to allow a fast answer to the question "find all
objects, in all available catalogues, around some target
position", an indexing mechanism is necessary. The total
number of object positions currently stored in VizieR,
excluding the megacatalogues, is about 32 × 10^6 (see the
contents table in section 5.6); a classical index, in terms
of a relational DBMS, shows very poor performance, especially
in the updating phase: the addition of a new catalogue can
require up to 4.6 million modifications or additions, which
becomes dramatically slow.

The method adopted for this indexing consists first in
mapping the celestial coordinates into a set of boxes using a
hierarchical spherical-cubic projection similar to the
technique used by Simbad (Wenger et al., 2000), but down to
level 8, which corresponds to a granularity of about 20', or
6 × 4^8 (~ 4 × 10^5) individual boxes. The list of catalogues
which exhibit sources in the region of the sky covered by the
box is then stored for each of the defined boxes, allowing
therefore a fast answer to the question: "what is the list of
catalogues which have a fair chance of having at least one
source close to a specified target?" The final step consists
in looking successively into the matching catalogues.

The method has the particularity of being hierarchical:
6 boxes are defined at level 0, 24 at level 1, ..., and going
down one step in the hierarchy consists in dividing each box
into four parts. The indexing mechanism recursively groups
contiguous non-empty boxes into a single box at the upper
level, meaning that a dense survey covering the whole sky is
represented by just the 6 boxes of level 0 in this index. In
practice, the 1247 catalogues with positions are summarized
in this index by 3.9 × 10^6 elements (to be compared to the
31.6 × 10^6 sources listed in the contents table), i.e. an
average of 3,000 elements per catalogue.
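
The box hierarchy described in this section can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a plain cube projection stands in for the spherical-cubic projection, whose exact form is not given here, and the names n_boxes, box_of, coalesce and covers are hypothetical, not part of VizieR or Simbad.

```python
import math
from itertools import product

MAX_LEVEL = 8  # depth quoted in the text


def n_boxes(level):
    """Number of boxes at a level: 6 cube faces, each box split in 4 per level."""
    return 6 * 4 ** level


def box_of(ra_deg, dec_deg, level=MAX_LEVEL):
    """Map a position to (face, quadrant digits); plain cube projection (assumption)."""
    ra, dec = math.radians(ra_deg), math.radians(dec_deg)
    v = (math.cos(dec) * math.cos(ra), math.cos(dec) * math.sin(ra), math.sin(dec))
    axis = max(range(3), key=lambda i: abs(v[i]))   # dominant axis selects the face pair
    face = 2 * axis + (0 if v[axis] > 0 else 1)
    # Project the two other components onto the face, giving coordinates in [-1, 1].
    a, b = (v[i] / abs(v[axis]) for i in range(3) if i != axis)
    x, y = (a + 1) / 2, (b + 1) / 2                 # rescale to [0, 1]
    digits = []
    for _ in range(level):                          # one quadrant digit per level
        x, y = 2 * x, 2 * y
        qx, qy = min(int(x), 1), min(int(y), 1)
        digits.append(2 * qy + qx)
        x, y = x - qx, y - qy
    return (face, tuple(digits))


def coalesce(boxes):
    """Repeatedly replace each complete group of four sibling boxes by its parent."""
    boxes = set(boxes)
    changed = True
    while changed:
        changed = False
        for face, digits in list(boxes):
            if not digits or (face, digits) not in boxes:
                continue                            # level-0 box, or already merged
            siblings = {(face, digits[:-1] + (q,)) for q in range(4)}
            if siblings <= boxes:
                boxes -= siblings
                boxes.add((face, digits[:-1]))
                changed = True
    return boxes


def covers(index_boxes, target_box):
    """True if the target box, or one of its ancestors, belongs to the index."""
    face, digits = target_box
    return any((face, digits[:k]) in index_boxes for k in range(len(digits) + 1))


# A survey filling every level-3 box collapses to the 6 boxes of level 0.
survey = coalesce((f, d) for f in range(6) for d in product(range(4), repeat=3))
# A catalogue with a single source occupies a single level-8 box.
small = coalesce({box_of(10.0, 20.0)})

print(n_boxes(MAX_LEVEL))   # 393216, i.e. about 4 x 10^5
print(len(survey))          # 6
```

Testing a target then amounts to calling covers(index, box_of(ra, dec)) for each catalogue's coalesced box set. A real index would more likely store, for each box, the list of catalogues that populate it, as the text describes.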



5.6. Current Contents 

The status of VizieR contents is presented in the
accompanying summary table, where we distinguish the tables
representing data about actual astronomical objects, which
can be accessed by their position in the sky. In terms of
number of available records, the tables containing celestial
positions represent over 78% even when the megacatalogues are
omitted, even though only 32% of the tables are concerned. In
other words, the average table dealing with actual
astronomical objects contains around 16,000 rows; this is a
theoretical mean, as can be seen from the histogram of the
table populations in VizieR represented in Fig. 2, which
shows a modal value around tables of 100 objects.

Fig. 2. Histogram of the number of rows among VizieR
tables (the darker bars correspond to tables containing
celestial coordinates).

6. VizieR Interfaces 

Several interfaces are currently available to access the
data stored in VizieR: directly from a Web browser, by
constructing a query following the ASU conventions, or
through the XML interfaces under development.

6.1. Access from a Browser 

From a WWW browser, a "standard query" in VizieR consists
of a few steps:

1. Locate the interesting catalogues in the VizieR Service⁶.
This can be done in various ways illustrated in Fig. 3:
from well-known catalogue acronyms like HIP or GSC, from a
choice in the set of predefined keywords, from authors'
names, or from a self-organizing (or Kohonen) map
constructed on the basis of the keywords attached to the
catalogues (Poinçot et al. 1998). New possibilities for
locating catalogues of interest for the user are currently
under development.

2. Once a catalogue table, or a small set of catalogue
tables, is located (for instance the Hipparcos Catalogue⁷
resulting from the Hipparcos mission), constraints about
what to search and how to present the results can be
specified, such as:

- constraints based on the celestial coordinates, i.e.
location in the neighbourhood of a target specified
by its actual coordinates in the sky, or by one of


6 http://vizier.u-strasbg.fr/cgi-bin/VizieR

7 http://vizier.u-strasbg.fr/cgi-bin/VizieR?-source=I/239/hip_main



[Figure: the VizieR first search page, offering (1) direct
entry of a catalogue abbreviation or number, (2) selection by
author's name, by words from the title or description, or by
keywords related to astronomy, wavelength, or mission, and
(3) a Kohonen self-organizing map displaying all available
catalogues, organized according to their associated keywords.]

Fig. 3. Excerpt of the VizieR first search page



its names as known in Simbad (see Wenger et al., 2000)

- any other constraint on any of the columns of the
table(s), like a minimal flux value, or the actual
existence of some parameter (non-NULL value)

- which columns are to be displayed, and in which
order the matching rows are to be presented.

3. By pushing the appropriate buttons, it is for instance
easy to get the list of Hipparcos stars closer than
5 parsecs to the Sun, ordered by increasing distance⁸.
Obtaining full details about one row is achieved by a
mouse click in the first column of the result: for
instance, the first row of the search for nearby stars
described above leads to the VizieR Detailed Page with
Hipparcos parameters and their explanations concerning
Proxima Centauri⁹.



4. Finally, there may be correlated data, like notes or
remarks, references, etc. In our example, Proxima
Centauri is related to the α Cen multiple star system,
whose components can be viewed from the link to the
catalogue of double and multiple stars (CCDM)¹⁰ that
appears in the detailed page.

8 http://vizier.u-strasbg.fr/cgi-bin/VizieR?-source=I/239/hip_main&-sort=-Plx&Plx=%3e=

9 http://vizier.u-strasbg.fr/cgi-bin/VizieR-5?-source=I/239/hip_main&HIP=70890

The monthly usage of VizieR is presently (October 1999)
about 40,000 external requests from 2700 different nodes;
mirror copies were recently installed in the US¹¹ and in
Japan¹² in order to overcome transcontinental network
congestion.

6.2. The ASU protocol 

Uniform access to all catalogues is based on the
so-called ASU¹³ (Astronomical Standardized URL)
protocol, resulting from discussions between several
institutes (CDS, ESO, CADC, Vilspa, OAT).
The basic concept of ASU is a standardized way of
specifying queries to remote catalogues in terms of
HTTP requests: the target catalogue is specified by
a -source=catalog_designation parameter, the target
sky position by a -c=name_or_position,rm=radius_in_arcmin
parameter, the output format by -mime=type, and general
constraints on parameters by column_name=constraint.
Note that representing a target by the name of an
astronomical object (typically a star or galaxy name, e.g.
3C 273) implies the use of a name server converting the
target name into a position in the sky, which is typically
achieved by a call to Simbad.
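
As a minimal sketch of how such a request could be assembled, the Python function below joins the parameters named above into a query URL. The function name asu_query_url is hypothetical, and the radius syntax and percent-encoding choices are assumptions based on the description in this section; the ASU documentation (footnote 13) remains the reference. The Plx value in the second call is illustrative only (5 parsecs corresponds to a parallax of 200 mas). No network access is performed.

```python
from urllib.parse import quote

# Base URL taken from footnote 6.
BASE = "http://vizier.u-strasbg.fr/cgi-bin/VizieR"


def asu_query_url(base_url, source, target=None, radius_arcmin=None,
                  mime=None, constraints=None):
    """Assemble an ASU-style query URL (hypothetical helper, not a VizieR API)."""
    params = [("-source", source)]
    if target is not None:
        position = target
        if radius_arcmin is not None:
            # Radius appended in the "rm=" form described in the text (assumption).
            position = f"{target},rm={radius_arcmin}"
        params.append(("-c", position))
    if mime is not None:
        params.append(("-mime", mime))
    # Column constraints follow the column_name=constraint pattern.
    for column, constraint in (constraints or {}).items():
        params.append((column, constraint))
    # Percent-encode the values, keeping '/', ',' and '=' readable.
    query = "&".join(f"{key}={quote(str(value), safe='/,=')}"
                     for key, value in params)
    return f"{base_url}?{query}"


# Sources within 10 arcmin of 3C 273 in the Hipparcos main table.
cone_url = asu_query_url(BASE, "I/239/hip_main", target="3C 273", radius_arcmin=10)

# Constraint on one column: parallax of at least 200 mas (illustrative value).
near_url = asu_query_url(BASE, "I/239/hip_main", constraints={"Plx": ">=200"})

print(cone_url)
print(near_url)
```

Because the whole query lives in the URL, any tool able to issue an HTTP request can address any catalogue in the same way, which is the uniformity the protocol aims at.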

6.3. The XML Interface 

The output of a query to VizieR as presented in section 6.1
can hardly be used by an independent application for
further data processing, such as the Aladin¹⁴ visualisation
tool (Bonnarel et al., 2000), which allows the catalogued
sources to be superimposed on actual images of the sky:
the application requires an accurate interpretation of the
catalogued output in terms of celestial positions in order to
find out the exact location of each source. This means that
Aladin has to figure out not only which columns represent
the celestial coordinates, but also accurate definitions of
the system used to express the coordinates, their accuracy,
etc.; in other words, the metadata about the celestial
coordinates.

XML (eXtensible Markup Language) is an emerging
standard which allows markup "tags" to be embedded within
a document; the key advantages of this language are that
the same document can either be parsed by simple-minded
programs (XML uses hierarchical structuring), or be
displayed in the new generation of browsers (via an XSL
style sheet which maps the markup "tags" into typographical
specifications). This language has other potential
benefits, especially for interoperability, which is
facilitated by the emergence of generic tools able to
process XML documents.



10 http://vizier.u-strasbg.fr/cgi-bin/VizieR-6?-source=I/239&-corr=PK=CCDM&CCDM==14396-6050

11 http://adc.gsfc.nasa.gov/vizier/

12 http://zl3.mtk.nao.ac.jp/vizier/

13 http://vizier.u-strasbg.fr/doc/asu.html

The XML layout of astronomical tables was discussed
extensively with interested collaborators, and the agreed
definitions were presented at a recent ADASS meeting
(Ochsenbein et al. 1999). The output of VizieR is readily
available in this format¹⁵, which is currently used by the
Aladin image applet; it is hoped that it will facilitate the
usage of the astronomical data in new contexts.
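
The following Python sketch illustrates why such metadata matter to an application like Aladin. The element and attribute names (TABLE, FIELD, ROW, role, system) and the data values are invented for this illustration; they are not the layout agreed in Ochsenbein et al. (1999), which is not reproduced here.

```python
import xml.etree.ElementTree as ET

# Made-up sample: each FIELD describes one column, and the coordinate
# columns carry their role and reference system as metadata.
SAMPLE = """<TABLE name="demo">
  <FIELD name="RAJ2000" unit="deg" role="ra" system="J2000"/>
  <FIELD name="DEJ2000" unit="deg" role="dec" system="J2000"/>
  <FIELD name="Vmag" unit="mag"/>
  <ROW>10.0,20.0,5.5</ROW>
  <ROW>30.5,-5.25,7.1</ROW>
</TABLE>"""


def read_positions(xml_text):
    """Return (coordinate system, [(ra, dec), ...]) using only the metadata."""
    table = ET.fromstring(xml_text)
    fields = table.findall("FIELD")
    # Locate the coordinate columns from their declared role, not their name.
    roles = {f.get("role"): i for i, f in enumerate(fields) if f.get("role")}
    ra_col, dec_col = roles["ra"], roles["dec"]
    system = fields[ra_col].get("system")
    rows = [row.text.split(",") for row in table.findall("ROW")]
    return system, [(float(r[ra_col]), float(r[dec_col])) for r in rows]


system, positions = read_positions(SAMPLE)
print(system, positions)
```

The application never has to guess from column names: the declared role and system tell it which values to plot and in which reference frame.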



6.4. Current Developments 

With its large set of homogenized catalogues, VizieR plays
a central role in a data-mining project currently being
developed as a collaboration between ESO and CDS, in two
main directions: (i) make use of the large set of described
columns in VizieR (over 70,000 currently) to build new
methods for locating the catalogues which are best suited
to a particular research topic; and (ii) develop automated
cross-correlation tools which can take into account the
largest possible set of meaningful parameters (Ortiz
et al., 1999).



7. Conclusions 

VizieR illustrates the benefits resulting from a
homogeneous documentation of the existing astronomical
catalogues, facilitating the transformation of a set of
heterogeneous data into a fully interactive database which
is furthermore able to interact with remote services.
Interoperability between databases, in astronomy and
probably in connected disciplines, will most likely be
among the key developments necessary to allow scientists
to make use of the existing high-quality data without the
prerequisite of being familiar with the data.